-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change sentinel in find(first|next|prev|last) to nothing
#25472
Conversation
base/array.jl
Outdated
@@ -1547,15 +1546,15 @@ julia> findfirst(A) | |||
2 | |||
|
|||
julia> findfirst(falses(3)) | |||
0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe do findfirst(...) == nothing
giving true
, that will be more explicit? Same below.
base/array.jl
Outdated
``` | ||
""" | ||
findlast(testf::Function, A) = findprev(testf, A, endof(A)) | ||
|
||
zero_sentinel(i) = i === nothing ? 0 : i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, this is just coalesce(i, 0)
, so probably not worth defining a new function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some bootstrap issues (can't use coalesce
in files in Core
), but yes overall this is better.
base/bitset.jl
Outdated
@@ -68,14 +68,14 @@ function _bits_findnext(b::Bits, start::Int) | |||
# start is 0-based | |||
# @assert start >= 0 | |||
_div64(start) + 1 > length(b) && return -1 | |||
unsafe_bitfindnext(b, start+1) - 1 | |||
zero_sentinel(unsafe_bitfindnext(b, start+1)) - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not keep unsafe_bitfindnext
returning 0
for no match, since it's internal? Wouldn't that involve less changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's one of each; I decided to change it for greater consistency. But indeed I followed this advice for other internal methods.
throw(BoundsError(s, i)) | ||
end | ||
@inbounds isvalid(s, i) || string_index_err(s, i) | ||
c = pred.x | ||
c ≤ '\x7f' && return _search(s, c % UInt8, i) | ||
c ≤ '\x7f' && return nothing_sentinel(_search(s, c % UInt8, i)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really useful to change _search
too, given that it's internal?
EDIT: actually, I read it backwards. Shouldn't _search
be changed too? It appears to be always used with nothing_sentinel
(which could be removed since it's only used in this file AFAICT). BTW, looks like you missed this method:
Line 265 in bf424d7
findnext(t::AbstractString, s::AbstractString, i::Integer) = _search(s, t, i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't that one always return a range? _search
caused me more headaches than any other here, so I could be misunderstanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right. Though my first remark about changing _search
still holds I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but I don't feel like doing it 😄. In truth this is what I tried first, spent half a day just trying to get Julia to build, and then punted and started from scratch. It's possible I was stymied by something very trivial, but at this stage even half a day is not worth it for an internal interface that can be changed later.
I think the problem is that the range-returning version calls the scalar-returning version and I don't know this part of Julia well enough to untangle it without investing more time than I can afford right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that feeling too, that code is complex.
Let's leave it that way, but please move nothing_sentinel
to this file since it isn't used elsewhere.
@@ -1114,7 +1114,7 @@ timesofar("nnz&find") | |||
b1 = trues(v1) | |||
b2 = falses(v1) | |||
for i = 1:v1 | |||
@test findprev(b1, i) == findprev(b1, true, i) == findprev(identity, b1, i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not test findprev(equalto(true), b1, i)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same codepath is covered by identity
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, given findnext(pred::EqualTo, B::BitArray, start::Integer)
now exists.
@@ -125,7 +125,7 @@ function invpermute!!(a, p::AbstractVector{<:Integer}) | |||
count = 0 | |||
start = 0 | |||
while count < length(a) | |||
start = findnext(!iszero, p, start+1) | |||
start = findnext(!iszero, p, start+1)::Int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we actually sure all AbstractArray
implementations will return an Int
? You could also try using notnothing(...)
, which throws when nothing
is passed to it, which should allow the compiler to assert the type (need to check that).
BTW, how did you find the places where this assertion can be added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyplace where the code didn't seem to check for a 0 return but then indexed it, I assumed it was guaranteed to succeed.
base/bitarray.jl
Outdated
@@ -1540,18 +1541,18 @@ function unsafe_bitfindprev(Bc::Vector{UInt64}, start::Integer) | |||
end | |||
end | |||
end | |||
return 0 | |||
return nothing | |||
end | |||
|
|||
# returns the index of the previous non-zero element, or 0 if all zeros |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"or nothing
"
base/bitarray.jl
Outdated
@@ -1459,13 +1459,13 @@ function unsafe_bitfindnext(Bc::Vector{UInt64}, start::Integer) | |||
end | |||
end | |||
end | |||
return 0 | |||
return nothing | |||
end | |||
|
|||
# returns the index of the next non-zero element, or 0 if all zeros |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"or nothing
"
bf424d7
to
8deab51
Compare
New version. |
This seems a safer than 0 and is infinitely better for offset arrays! :) This was said in another forum but I’ll say it one time more - this seems like the kind of condition that can only be dealt with properly when the caller explicitly checks the result. To me this is a textbook case of using something like Just thoughts - don’t let this slow progress on |
@andyferris Better continue the discussion about |
@nalimilan On one hand, I completely agree, let's just move forward with a simple thing which is correct for arrays (including offset arrays).
On the other hand, the underlying thing which bothers me here and in JuliaLang/Juleps#47 is that we discuss this "inconvenience" in a strange way - even for arrays, it is absolutely necessary for the caller to do something to deal with the fact that the index might not be found. Whichever way we disguise it (with sentinel values, union types, wrapping types, etc), the caller really needs to check some boolean condition whether a matching index was found or not. My judgement is that wrapping the return in a But yes, perhaps this is something better discussed elsewhere! Like I tried to say earlier, I only want to write this for the record on JuliaLang/julia for future reference (say, for v2.0 development), rather than do anything to block this PR (which is clearly better than the status quo, and therefore quite acceptable). |
f6d7053
to
d195c91
Compare
OK, barring more objections I think this is ready. Let's first get a performance test: @nanosoldier |
Thanks Tim! BTW, I think you missed my question at #24774 (comment), which is now hidden. |
Nanosoldier is hitting a |
Unfortunately, it's going to be difficult to support both the new and the old API. I guess we'll need to add a |
I see slight regressions in julia> @btime perf_findnext($x) # length-1000 Bool vector
1.234 μs (0 allocations: 0 bytes) vs julia> @btime perf_findnext($x)
837.929 ns (0 allocations: 0 bytes) There are a couple of apparent big regressions in what appears to be completely unrelated code (e.g., a |
I'm not seeing any complaints about that regression. Any objections to merging? If not, I'll fix the conflicts. |
d195c91
to
b6bd419
Compare
I just noticed that
My overall vote (at the moment) is the last option: removing |
I had noticed that too, and I think we should return |
This needs a NEWS entry |
Closes JuliaLang/Juleps#47. One could discuss returning a
NotFound
value instead.